Comparison of optical character recognition software

An OCR SDK is a software development kit for adding optical character recognition capabilities to forms processing applications, document imaging management systems, e-discovery systems and records management solutions.

To ease the difficulty of integrating OCR technology, some OCR SDKs provide a large number of APIs and support multiple operating systems and programming languages.

Here is a non-exhaustive comparison of optical character recognition software:

| Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABBYY FineReader | 1989 | 11 | 2011 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 186[1] | ? | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows; Express Edition for Mac.[2] |
| AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? | Works with structured, semi-structured, and unstructured documents. |
| CuneiForm/OpenOCR | ? | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | Enterprise-class system; preserves text formatting and recognizes complex tables of any structure. |
| ExperVision TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 17 | 2618 | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994).[3] PC Magazine reported that "the speed of ExperVision's OpenRTK is four to eight times faster than competition"[4] but also criticized it as "Not as accurate as rival products, clumsy interface, limited options for proofreading, couldn't open some files in standard PDF or image formats."[5] |
| OCRFORMS[6] | 2009 | 11.10 | 2011 | Proprietary | No | Yes | No | Yes | No | C/Python | No | Any language based on the Latin alphabet | Printed and handwritten Latin fonts | Features a complete GUI and a command-line tool for batch processing. Uses proprietary algorithms for OCR/ICR/OMR and advanced string correction technology. |
| GOCR | ? | 0.47 | 2009 | GPL | No | Yes | Yes | Yes | Yes | C | ? | ? | ? | |
| LEADTOOLS[7] | 1990[8] | 17 | 2010 | Proprietary | No | Yes | No | No | No | Various | Yes | 56[9] | Any printed font | Supports Latin, Asian, Arabic, and MICR character sets.[7] For full-page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[10] ICR (handwritten text recognition) is supported.[11] |
| Java OCR | ? | ? | 2010 | ? | No | Yes | No | No | No | ? | ? | ? | ? | Uses Java. |
| Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | Uses OmniPage. |
| Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | |
| Ocrad | ? | 0.20 | 2010 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? | Command line. |
| OCRopus | ? | 0.3.1 | 2008 | Apache | No | No | No | Yes | No | C++ and Lua | ? | ? | ? | Pluggable framework that can use Tesseract. |
| OCRFeeder | ? | 0.7.7 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? | Features a full user interface and a command-line tool for automated operation. Has its own segmentation algorithm but uses system-wide OCR engines such as Tesseract or Ocrad. |
| OmniPage | 2005 | 18 | 2011 | Proprietary | No | Yes | Yes | No | No | C/C++/C#[12] | Yes | ? | ? | Product of Nuance Communications. |
| PSI:Capture | 1995 | 4.1 | 2011 | Proprietary | No | Yes | No | No | No | C# | No | 99 | Any printed font | Scans, captures and extracts data from business documents such as invoices, forms and correspondence, and exports images/data to over 50 different back-end systems, including Microsoft SharePoint. |
| Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications. |
| Readiris | ? | 12 Pro | 2009 | Proprietary | No | Yes | Yes | No | No | C++ | Yes | ? | ? | Product of I.R.I.S. Group of Belgium. Asian and Middle Eastern editions available. |
| ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | Scans, captures and classifies business documents such as invoices, forms and purchase orders, integrated with business processes. |
| RelayFax | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | Many | ? | Converts faxed pages into editable document formats (DOC, PDF, etc.). |
| Scantron Cognition | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | For working with localized interfaces, corresponding language support is required. |
| SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | |
| SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | For musical scores. |
| Tesseract | ? | 3.01 | 2010 | Apache | No | Yes[13] | Yes | Yes | No | C++, C | ? | 35+[14] | ? | Created by Hewlett-Packard; under further development by Google. |
| Transym OCR | ? | 3.0 | 2008 | Proprietary | No | Yes | No | No | No | C#, C/C++, VB, VB.NET | Yes | 11 | ? | |
| Zonal OCR | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | |

References